Investigation of Distributed Search Engine Based on Hadoop
Author
Abstract
This paper begins with a review of the research status of search engines, followed by a discussion of the goals of a search engine, and then explains the principle of distributed computing. Next, the MapReduce distributed computing model and the Hadoop distributed file system (HDFS) are analyzed in detail. Finally, the distributed search engine architecture is presented. On the basis of this architecture, future challenges and opportunities for distributed search engines are highlighted.
Similar resources
The efficient implementation of distributed indexing with Hadoop for digital investigations on Big Data
Big Data brings new challenges to the field of e-Discovery or digital forensics, and these challenges are mostly connected to the various methods for data processing. Considering that time and cost are the most important factors in determining the success or failure of a digital investigation, the development of a valid indexing method for efficient search should come first to more quickly and accurat...
The Implementation of Hadoop-based Crawler System and Graphlite-based PageRank-Calculation In Search Engine
Nowadays, the size of the Internet is experiencing rapid growth. As of December 2014, the number of global Internet websites had exceeded 1 billion, and all kinds of information resources are integrated together on the Internet. As a result, the search engine has become a necessary tool for all users to retrieve useful information from vast amounts of web data. Generally speaking, a complete search e...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
Improving search results in life science by recommendations based on semantic information
The management and handling of big data is a major challenge in the area of life science. Besides data storage, information retrieval methods have to be adapted to huge amounts of data as well. Therefore, we present an approach to improve search results in life science by recommendations based on semantic information. In detail, we determine relationships between documents by searching for shared...
A New Model of Search Engine based on Cloud Computing
With the rapid increase in websites and Internet users, the traditional search engine faces great challenges in real-time search, response speed, and the storage of massive numbers of pages. However, a search engine deployed in the cloud can overcome these shortcomings, because cloud computing has two major advantages: mass data processing and mass data storage. By analyzing the open-source cloud compu...